{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning for Finance Freestyle\n",
    "\n",
    "In this lab you'll be given the opportunity to apply everything you have learned to build a trading strategy for SP500 stocks. First, let's introduce the dataset you'll be using."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Data\n",
    "\n",
    "Use BigQuery's magic function to pull data as follows:\n",
    "\n",
    "    Dataset Name: ml4f\n",
    "    Table Name: percent_change_sp500\n",
    "\n",
    "The following query will pull 10 rows of data from the table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery df\n",
    "SELECT \n",
    "    *\n",
    "FROM\n",
    "    `cloud-training-prod-bucket.ml4f.percent_change_sp500`\n",
    "LIMIT\n",
    "    10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the table contains daily open and close data for SP500 stocks. The table also contains some features that have been generated for you using [navigation functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/navigation_functions) and [analytic functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts). Let's dig into the schema a bit more. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>table_catalog</th>\n",
       "      <th>table_schema</th>\n",
       "      <th>table_name</th>\n",
       "      <th>column_name</th>\n",
       "      <th>ordinal_position</th>\n",
       "      <th>is_nullable</th>\n",
       "      <th>data_type</th>\n",
       "      <th>is_hidden</th>\n",
       "      <th>is_system_defined</th>\n",
       "      <th>is_partitioning_column</th>\n",
       "      <th>clustering_ordinal_position</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>symbol</td>\n",
       "      <td>1</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>Date</td>\n",
       "      <td>2</td>\n",
       "      <td>YES</td>\n",
       "      <td>DATE</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>Open</td>\n",
       "      <td>3</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>Close</td>\n",
       "      <td>4</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>tomorrow_close</td>\n",
       "      <td>5</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>tomo_close_m_close</td>\n",
       "      <td>6</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_MIN_prior_5_days</td>\n",
       "      <td>7</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_MIN_prior_20_days</td>\n",
       "      <td>8</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_MIN_prior_260_days</td>\n",
       "      <td>9</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_MAX_prior_5_days</td>\n",
       "      <td>10</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_MAX_prior_20_days</td>\n",
       "      <td>11</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_MAX_prior_260_days</td>\n",
       "      <td>12</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_AVG_prior_5_days</td>\n",
       "      <td>13</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_AVG_prior_20_days</td>\n",
       "      <td>14</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_AVG_prior_260_days</td>\n",
       "      <td>15</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_STDDEV_prior_5_days</td>\n",
       "      <td>16</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_STDDEV_prior_20_days</td>\n",
       "      <td>17</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_STDDEV_prior_260_days</td>\n",
       "      <td>18</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>close_values_prior_260</td>\n",
       "      <td>19</td>\n",
       "      <td>NO</td>\n",
       "      <td>ARRAY&lt;FLOAT64&gt;</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>days_on_market</td>\n",
       "      <td>20</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>scaled_change</td>\n",
       "      <td>21</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>s_p_scaled_change</td>\n",
       "      <td>22</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>normalized_change</td>\n",
       "      <td>23</td>\n",
       "      <td>YES</td>\n",
       "      <td>FLOAT64</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>company</td>\n",
       "      <td>24</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>industry</td>\n",
       "      <td>25</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>cloud-training-prod-bucket</td>\n",
       "      <td>ml4f</td>\n",
       "      <td>percent_change_sp500</td>\n",
       "      <td>direction</td>\n",
       "      <td>26</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 table_catalog table_schema            table_name  \\\n",
       "0   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "1   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "2   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "3   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "4   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "5   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "6   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "7   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "8   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "9   cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "10  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "11  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "12  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "13  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "14  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "15  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "16  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "17  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "18  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "19  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "20  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "21  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "22  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "23  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "24  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "25  cloud-training-prod-bucket         ml4f  percent_change_sp500   \n",
       "\n",
       "                    column_name  ordinal_position is_nullable       data_type  \\\n",
       "0                        symbol                 1         YES          STRING   \n",
       "1                          Date                 2         YES            DATE   \n",
       "2                          Open                 3         YES         FLOAT64   \n",
       "3                         Close                 4         YES         FLOAT64   \n",
       "4                tomorrow_close                 5         YES         FLOAT64   \n",
       "5            tomo_close_m_close                 6         YES         FLOAT64   \n",
       "6        close_MIN_prior_5_days                 7         YES         FLOAT64   \n",
       "7       close_MIN_prior_20_days                 8         YES         FLOAT64   \n",
       "8      close_MIN_prior_260_days                 9         YES         FLOAT64   \n",
       "9        close_MAX_prior_5_days                10         YES         FLOAT64   \n",
       "10      close_MAX_prior_20_days                11         YES         FLOAT64   \n",
       "11     close_MAX_prior_260_days                12         YES         FLOAT64   \n",
       "12       close_AVG_prior_5_days                13         YES         FLOAT64   \n",
       "13      close_AVG_prior_20_days                14         YES         FLOAT64   \n",
       "14     close_AVG_prior_260_days                15         YES         FLOAT64   \n",
       "15    close_STDDEV_prior_5_days                16         YES         FLOAT64   \n",
       "16   close_STDDEV_prior_20_days                17         YES         FLOAT64   \n",
       "17  close_STDDEV_prior_260_days                18         YES         FLOAT64   \n",
       "18       close_values_prior_260                19          NO  ARRAY<FLOAT64>   \n",
       "19               days_on_market                20         YES           INT64   \n",
       "20                scaled_change                21         YES         FLOAT64   \n",
       "21            s_p_scaled_change                22         YES         FLOAT64   \n",
       "22            normalized_change                23         YES         FLOAT64   \n",
       "23                      company                24         YES          STRING   \n",
       "24                     industry                25         YES          STRING   \n",
       "25                    direction                26         YES          STRING   \n",
       "\n",
       "   is_hidden is_system_defined is_partitioning_column  \\\n",
       "0         NO                NO                     NO   \n",
       "1         NO                NO                     NO   \n",
       "2         NO                NO                     NO   \n",
       "3         NO                NO                     NO   \n",
       "4         NO                NO                     NO   \n",
       "5         NO                NO                     NO   \n",
       "6         NO                NO                     NO   \n",
       "7         NO                NO                     NO   \n",
       "8         NO                NO                     NO   \n",
       "9         NO                NO                     NO   \n",
       "10        NO                NO                     NO   \n",
       "11        NO                NO                     NO   \n",
       "12        NO                NO                     NO   \n",
       "13        NO                NO                     NO   \n",
       "14        NO                NO                     NO   \n",
       "15        NO                NO                     NO   \n",
       "16        NO                NO                     NO   \n",
       "17        NO                NO                     NO   \n",
       "18        NO                NO                     NO   \n",
       "19        NO                NO                     NO   \n",
       "20        NO                NO                     NO   \n",
       "21        NO                NO                     NO   \n",
       "22        NO                NO                     NO   \n",
       "23        NO                NO                     NO   \n",
       "24        NO                NO                     NO   \n",
       "25        NO                NO                     NO   \n",
       "\n",
       "   clustering_ordinal_position  \n",
       "0                         None  \n",
       "1                         None  \n",
       "2                         None  \n",
       "3                         None  \n",
       "4                         None  \n",
       "5                         None  \n",
       "6                         None  \n",
       "7                         None  \n",
       "8                         None  \n",
       "9                         None  \n",
       "10                        None  \n",
       "11                        None  \n",
       "12                        None  \n",
       "13                        None  \n",
       "14                        None  \n",
       "15                        None  \n",
       "16                        None  \n",
       "17                        None  \n",
       "18                        None  \n",
       "19                        None  \n",
       "20                        None  \n",
       "21                        None  \n",
       "22                        None  \n",
       "23                        None  \n",
       "24                        None  \n",
       "25                        None  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery \n",
    "SELECT\n",
    "    * EXCEPT(is_generated, generation_expression, is_stored, is_updatable)\n",
    "FROM\n",
    "    `cloud-training-prod-bucket.ml4f`.INFORMATION_SCHEMA.COLUMNS\n",
    "WHERE\n",
    "    table_name = \"percent_change_sp500\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most of the features, like `open` and `close` are pretty straightforward. The features generated using analytic functions, such as `close_MIN_prior_5_days` are best described using an example. Let's take the 6 most recent rows of data for IBM and reproduce the `close_MIN_prior_5_days` column. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "SELECT \n",
    "    *\n",
    "FROM\n",
    "    `cloud-training-prod-bucket.ml4f.percent_change_sp500`\n",
    "WHERE\n",
    "    symbol = 'IBM'\n",
    "ORDER BY \n",
    "    Date DESC\n",
    "LIMIT 6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For `Date = 2013-02-01` how did we arrive at `close_MIN_prior_5_days = 0.989716`? The minimum close over the past five days was `203.07`. This is normalized by the current day's close of `205.18` to get `close_MIN_prior_5_days = 203.07 / 205.18 = 0.989716`. The other features utilizing analytic functions were generated in a similar way. Here are explanations for some of the other features:\n",
    "\n",
    "* __scaled_change__: `tomo_close_m_close / close`\n",
    "* __s_p_scaled_change__: This value is calculated the same way as `scaled_change` but for the S&P 500 index. \n",
    "* __normalized_change__: `scaled_change - s_p_scaled_change` The normalization using the S&P index fund helps ensure that the future price of a stock is not due to larger market effects. Normalization helps us isolate the factors contributing to the performance of a stock_market.\n",
    "* __direction__: This is the target variable we're trying to predict. The logic for this variable is as follows: \n",
    "\n",
    "    ```sql\n",
    "    CASE \n",
    "        WHEN normalized_change < -0.01 THEN 'DOWN'\n",
    "        WHEN normalized_change > 0.01 THEN 'UP'\n",
    "        ELSE 'STAY'\n",
    "    END AS direction\n",
    "    ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create classification model for `direction`\n",
    "\n",
    "In this example, your job is to create a classification model to predict the `direction` of each stock. Be creative! You can do this in any number of ways. For example, you can use BigQuery, Scikit-Learn, or AutoML. Feel free to add additional features, or use time series models.   \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Establish a Simple Benchmark\n",
    "\n",
    "One way to assess the performance of a model is to compare it to a simple benchmark. We can do this by seeing what kind of accuracy we would get using the naive strategy of just predicting the majority class. Across the entire dataset, the majority class is 'STAY'. Using the following query we can see how this naive strategy would perform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Direction</th>\n",
       "      <th>percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>STAY</td>\n",
       "      <td>53.766049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>UP</td>\n",
       "      <td>23.240681</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>DOWN</td>\n",
       "      <td>22.993271</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Direction  percentage\n",
       "0      STAY   53.766049\n",
       "1        UP   23.240681\n",
       "2      DOWN   22.993271"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "WITH subset as (\n",
    "    SELECT \n",
    "        Direction\n",
    "    FROM\n",
    "        `cloud-training-prod-bucket.ml4f.percent_change_sp500`\n",
    "    WHERE\n",
    "        tomorrow_close IS NOT NULL\n",
    ")\n",
    "SELECT \n",
    "    Direction,\n",
    "    100.0 * COUNT(*) / (SELECT COUNT(*) FROM subset) as percentage\n",
    "FROM\n",
    "    subset\n",
    "GROUP BY\n",
    "    Direction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, the naive strategy of just guessing the majority class would have accuracy of around 54% across the entire dataset. See if you can improve on this. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train Your Own Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Write code to build a model to predict Direction"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
